18 research outputs found

    Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models

    Motion is one of the strongest cues available for segmentation. While motion segmentation finds wide-ranging applications in object detection, tracking, surveillance, robotics, image and video compression, scene reconstruction, video editing, and so on, it faces various challenges, such as accurate motion recovery from noisy data, the varying complexity of the models required to describe the computed image motion, the dynamic nature of the scene, which may include a large number of independently moving objects undergoing occlusions, and the need to make high-level decisions while dealing with long image sequences. With sparse point features as the common thread, this thesis presents three distinct approaches that address some of the above-mentioned motion segmentation challenges. The first part deals with the detection and tracking of sparse point features in image sequences. A framework is proposed in which point features can be tracked jointly. Traditionally, sparse features have been tracked independently of one another. Combining ideas from Lucas-Kanade and Horn-Schunck, this thesis presents a technique in which the estimated motion of a feature is influenced by the motion of the neighboring features. The joint feature tracking algorithm improves tracking performance over the standard Lucas-Kanade-based tracking approach, especially when tracking features in untextured regions. The second part is related to motion segmentation using sparse point feature trajectories. The approach utilizes a spatially constrained mixture model framework and a greedy EM algorithm to group point features. In contrast to previous work, the algorithm is incremental in nature and allows for an arbitrary number of objects traveling at different relative speeds to be segmented, thus eliminating the need for an explicit initialization of the number of groups.
The primary parameter used by the algorithm is the amount of evidence that must be accumulated before the features are grouped. A statistical goodness-of-fit test monitors the change in the motion parameters of a group over time in order to automatically update the reference frame. The approach works in real time and is able to segment various challenging sequences captured from still and moving cameras that contain multiple independently moving objects and motion blur. The third part of this thesis deals with the use of specialized models for motion segmentation. Articulated human motion is chosen as a representative example that requires a complex model to be accurately described. A motion-based approach for segmentation, tracking, and pose estimation of articulated bodies is presented. The human body is represented using the trajectories of a number of sparse points. A novel motion descriptor encodes the spatial relationships of the motion vectors representing various parts of the person and can discriminate between articulated and non-articulated motions, as well as between various poses and view angles. Furthermore, a nearest neighbor search for the closest motion descriptor from the labeled training data, consisting of the human gait cycle in multiple views, is performed, and this distance is fed to a Hidden Markov Model defined over multiple poses and viewpoints to obtain temporally consistent pose estimates. Experimental results on various sequences of walking subjects with multiple viewpoints and scales demonstrate the effectiveness of the approach. In particular, the purely motion-based approach is able to track people in night-time sequences, even when appearance-based cues are not available. Finally, an application of image segmentation is presented in the context of iris segmentation. The iris is a widely used biometric for recognition and is known to be highly accurate if the segmentation of the iris region is near perfect.
Non-ideal situations arise when the iris is occluded by eyelashes or eyelids, or when the overall quality of the segmented iris is degraded by illumination changes or by out-of-plane rotation of the eye. The proposed iris segmentation approach combines the appearance and the geometry of the eye to segment iris regions from non-ideal images. The image is modeled as a Markov random field, and a graph-cuts-based energy minimization algorithm is applied to label the pixels as eyelashes, pupil, iris, or background using texture and image intensity information. The iris shape is modeled as an ellipse and is used to refine the pixel-based segmentation. The results indicate the effectiveness of the segmentation algorithm in handling non-ideal iris images.
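
The temporal smoothing step in the third part of the abstract above (nearest-neighbor descriptor distances fed to a Hidden Markov Model) can be illustrated with a minimal Viterbi sketch. It is not the thesis implementation: the log-emission score of minus the distance, the uniform prior, the single `stay_prob` transition parameter, and the function name `smooth_poses` are all assumptions made for illustration.

```python
import math

def smooth_poses(distances, stay_prob=0.8):
    """Viterbi decoding over pose labels (illustrative sketch).

    distances[t][k] is the descriptor distance to pose k at frame t.
    Assumptions: log-emission = -distance, uniform prior over poses, and
    a transition model that keeps the pose with probability stay_prob and
    spreads the remainder evenly over the other poses (needs >= 2 poses).
    """
    n = len(distances[0])
    move_prob = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else move_prob)
                  for j in range(n)] for i in range(n)]

    score = [-d for d in distances[0]]   # best log-score ending in each pose
    back = []                            # back-pointers, one list per frame
    for obs in distances[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] - obs[j])
        back.append(ptr)

    # Trace the best path backwards from the best final pose.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Frame 2 is ambiguous: pose 1 is marginally closer than pose 0.
distances = [[0.1, 2.0, 2.0],
             [0.1, 2.0, 2.0],
             [1.0, 0.9, 2.0],
             [0.1, 2.0, 2.0]]
per_frame = [min(range(3), key=row.__getitem__) for row in distances]
smoothed = smooth_poses(distances)
```

Picking the nearest pose independently in each frame lets the ambiguous frame flip the label, whereas the decoded path stays on one pose because switching twice costs more than the small difference in distance.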

    Field evaluation of a mobile app for assisting blind and visually impaired travelers to find bus stops

    Purpose: Blind and visually impaired (BVI) travelers who rely on digital maps to reach their desired bus stops can reportedly be left with considerable gaps due to GPS inaccuracy and mapping errors. We evaluated the ability of a mobile app, All_Aboard, to guide BVI travelers precisely to bus stops. Methods: The All_Aboard app detected bus-stop signs in real time via the smartphone camera using a neural network model, and provided distance-coded audio feedback to help localize the detected sign. BVI individuals used the All_Aboard and Google Maps apps to localize 10 bus-stop locations in downtown Boston and another 10 in a suburban area. For each bus stop, the subjects used the apps to navigate as close as possible to the physical bus-stop sign, starting from 30 to 50 meters away. The outcome measures were success rate and gap distance between the app-indicated location and the actual physical location of the bus stop. Results: The study was conducted with 24 legally blind participants (mean age [SD]: 51 [14] years; 11 (46%) female). The success rate of the All_Aboard app (91%) was significantly higher than that of Google Maps (52%, p<0.001). The gap distance when using the All_Aboard app was significantly lower (mean [95% CI]: 1.8 [1.2-2.3] meters) than with Google Maps (7 [6.5-7.5] meters; p<0.001). Conclusion: The All_Aboard app localizes bus stops more accurately and reliably than GPS-based smartphone navigation options in real-world environments.

    Mobile gaze tracking system for outdoor walking behavioral studies

    Most gaze tracking techniques estimate gaze points on screens, on scene images, or in confined spaces. Tracking of gaze in open-world coordinates, especially in walking situations, has rarely been addressed. We use a head-mounted eye tracker combined with two inertial measurement units (IMUs) to track gaze orientation relative to the heading direction in outdoor walking. Head movements relative to the body are measured by the difference in output between the IMUs on the head and body trunk. The use of the IMU pair reduces the impact of environmental interference on each sensor. The system was tested in busy urban areas and allowed drift compensation for long (up to 18 min) gaze recordings. Comparison with ground truth revealed an average error of 3.3° while walking straight segments. The range of gaze scanning in walking is frequently larger than the estimation error by about one order of magnitude. Our proposed method was also tested with real cases of natural walking and was found to be suitable for the evaluation of gaze behaviors in outdoor environments.
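
The differencing idea in the abstract above can be sketched in a few lines. This is a simplified, yaw-only illustration, not the authors' method: a real system would work with full 3D orientations, and the function names and the degree convention here are assumptions.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def gaze_re_heading(head_yaw, trunk_yaw, eye_in_head_yaw):
    """Gaze yaw relative to the body heading (yaw-only sketch).

    head_yaw and trunk_yaw are the absolute yaw readings of the two IMUs;
    eye_in_head_yaw comes from the head-mounted eye tracker.
    """
    head_re_body = wrap_deg(head_yaw - trunk_yaw)
    return wrap_deg(head_re_body + eye_in_head_yaw)

# Head turned 20 degrees to one side of the trunk, eye 5 degrees back.
clean = gaze_re_heading(350.0, 10.0, 5.0)

# A disturbance that shifts both IMU readings equally cancels out.
disturbed = gaze_re_heading(350.0 + 7.0, 10.0 + 7.0, 5.0)
```

Because only the difference between the two IMU readings enters the result, any offset common to both sensors drops out, which is one plausible reading of why the IMU pair is less sensitive to environmental interference.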

    Motion segmentation at any speed

    We present an incremental approach to motion segmentation. Feature points are detected and tracked throughout an image sequence, and the features are grouped using a region-growing algorithm with an affine motion model. The primary parameter used by the algorithm is the amount of evidence that must accumulate before features are grouped. In contrast to previous work, the algorithm allows a variable number of image frames to affect the decision process, thus enabling objects to be detected independently of their velocity in the image. Procedures are presented for grouping features, measuring the consistency of the resulting groups, assimilating new features into existing groups, and splitting groups over time. Experimental results on a number of challenging image sequences demonstrate the effectiveness of the technique.
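
As an illustration of the affine motion model mentioned above, the following sketch fits an affine map to a group of feature displacements by least squares and measures how well a candidate feature agrees with it. It covers only the fit and the residual, not the paper's region-growing or evidence-accumulation procedure; the function names and the choice of normal equations are assumptions.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x

def fit_affine(src, dst):
    """Least-squares affine map taking src points to dst points.

    Returns (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f.
    Needs at least three non-collinear points.
    """
    rows = [(x, y, 1.0) for x, y in src]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    Atx = [sum(r[i] * q[0] for r, q in zip(rows, dst)) for i in range(3)]
    Aty = [sum(r[i] * q[1] for r, q in zip(rows, dst)) for i in range(3)]
    return tuple(solve3(AtA, Atx) + solve3(AtA, Aty))

def residual(params, p, q):
    """Distance between the model's prediction for p and the observed q."""
    a, b, c, d, e, f = params
    px, py = a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f
    return ((px - q[0]) ** 2 + (py - q[1]) ** 2) ** 0.5

# Four features that all translate by (2, 3) between two frames.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(x + 2.0, y + 3.0) for x, y in src]
model = fit_affine(src, dst)

consistent = residual(model, (0.5, 0.5), (2.5, 3.5))    # follows the group
inconsistent = residual(model, (0.5, 0.5), (10.0, 10.0))  # moves differently
```

A feature whose residual stays small across frames is a candidate for joining the group, while a large residual suggests it belongs to a different motion.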

    A Hardware-Friendly Optical Flow-Based Time-to-Collision Estimation Algorithm

    This work proposes a hardware-friendly, dense optical flow-based Time-to-Collision (TTC) estimation algorithm intended to be deployed on smart video sensors for collision avoidance. The algorithm, optimized for hardware, first extracts biological visual motion features (motion energies), and then utilizes a Random Forests regressor to predict robust and dense optical flow. Finally, TTC is reliably estimated from the divergence of the optical flow field. This algorithm involves only feed-forward data flows with simple pixel-level operations, and hence has inherent parallelism for hardware acceleration. The algorithm offers good scalability, allowing for flexible tradeoffs among estimation accuracy, processing speed, and hardware resources. Experimental evaluation shows that the use of Random Forests improves the accuracy of the optical flow estimation compared to existing voting-based approaches. Furthermore, results show that the TTC values estimated by the algorithm closely follow the ground truth. The specifics of the hardware design to implement the algorithm on a real-time embedded system are laid out.
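
The last step described above, TTC from flow divergence, can be sketched as follows. The sketch assumes pure translation along the optical axis toward a fronto-parallel surface, in which case the flow is u = x/τ, v = y/τ and the divergence equals 2/τ. The averaging over the whole field, the function names, and the units (flow in pixels per frame, TTC in frames) are assumptions for illustration and are not taken from the paper.

```python
def divergence(u, v):
    """Central-difference divergence du/dx + dv/dy at interior pixels.

    u and v are row-major 2D lists holding the horizontal and vertical
    flow components, indexed as u[y][x].
    """
    h, w = len(u), len(u[0])
    div = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            dudx = (u[y][x + 1] - u[y][x - 1]) / 2.0
            dvdy = (v[y + 1][x] - v[y - 1][x]) / 2.0
            row.append(dudx + dvdy)
        div.append(row)
    return div

def ttc_from_flow(u, v):
    """Time-to-collision in frames from the mean flow divergence.

    Uses TTC = 2 / divergence, valid under the looming assumptions stated
    in the text. Returns infinity when the field is not expanding.
    """
    values = [d for row in divergence(u, v) for d in row]
    mean_div = sum(values) / len(values)
    if mean_div <= 0.0:
        return float("inf")
    return 2.0 / mean_div

# Synthetic expanding flow for a surface 50 frames from contact.
tau, size, center = 50.0, 7, 3
u = [[(x - center) / tau for x in range(size)] for y in range(size)]
v = [[(y - center) / tau for x in range(size)] for y in range(size)]
ttc = ttc_from_flow(u, v)
```

The computation uses only local differences and a sum, which is consistent with the abstract's point about feed-forward, pixel-level operations, although a dense per-pixel TTC map would skip the averaging step.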

    Non-Ideal Iris Segmentation Using Graph Cuts

    A non-ideal iris segmentation approach using graph cuts is presented. Unlike many existing algorithms for iris localization, which rely extensively on eye geometry, the proposed approach is predominantly based on image intensities. In a stepwise procedure, eyelashes are first segmented from the input images using image texture, then the iris is segmented using grayscale information, followed by a postprocessing step that utilizes eye geometry to refine the results. A preprocessing step removes specular reflections in the iris, and image gradients in a pixel neighborhood are used to compute texture. The image is modeled as a Markov random field, and a graph-cut-based energy minimization algorithm [2] is used to separate textured and untextured regions for eyelash segmentation, as well as to segment the pupil, iris, and background using pixel intensity values. The algorithm is automatic, unsupervised, and efficient at producing smooth segmentation regions on many non-ideal iris images. A comparison of the estimated iris region parameters with the ground truth data is provided.